89 research outputs found

    Learning Accurate and Interpretable Decision Rule Sets from Neural Networks

    This paper proposes a new paradigm for learning a set of independent logical rules in disjunctive normal form as an interpretable model for classification. We consider the problem of learning an interpretable decision rule set as training a neural network in a specific, yet very simple two-layer architecture. Each neuron in the first layer directly maps to an interpretable if-then rule after training, and the output neuron in the second layer directly maps to a disjunction of the first-layer rules to form the decision rule set. Our representation of neurons in this first rules layer enables us to encode both the positive and the negative association of features in a decision rule. State-of-the-art neural net training approaches can be leveraged for learning highly accurate classification models. Moreover, we propose a sparsity-based regularization approach to balance between classification accuracy and the simplicity of the derived rules. Our experimental results show that our method can generate more accurate decision rule sets than other state-of-the-art rule-learning algorithms with better accuracy-simplicity trade-offs. Further, when compared with uninterpretable black-box machine learning approaches such as random forests and full-precision deep neural networks, our approach can easily find interpretable decision rule sets that have comparable predictive performance. Comment: Published at AAAI 202
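The rule-set structure described in this abstract (if-then rules combined by a disjunction, with features entering a rule positively or negatively) can be illustrated with a minimal sketch. This is not the authors' implementation and says nothing about how the rules are learned; the names `rule_fires`, `predict`, and `rule_set`, the binary feature encoding, and the example rules are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical encoding: a literal is (feature_index, expected_value).
# expected_value=True models a positive association (feature must be present),
# expected_value=False models a negative association (feature must be absent).
Literal = Tuple[int, bool]
Rule = List[Literal]  # a conjunction of literals, i.e. one if-then rule


def rule_fires(rule: Rule, x: Sequence[int]) -> bool:
    """Return True when every literal of the rule is satisfied by x."""
    # Note: an empty rule has no conditions and therefore always fires.
    return all(bool(x[i]) == expected for i, expected in rule)


def predict(rule_set: List[Rule], x: Sequence[int]) -> bool:
    """Disjunction of rules (DNF): positive if at least one rule fires."""
    return any(rule_fires(rule, x) for rule in rule_set)


# Illustrative rule set over four binary features (made up for this sketch).
rule_set: List[Rule] = [
    [(0, True), (2, False)],  # IF x0 AND NOT x2 THEN positive
    [(1, True), (3, True)],   # IF x1 AND x3 THEN positive
]

if __name__ == "__main__":
    for sample in ([1, 0, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1]):
        print(sample, predict(rule_set, sample))
```

In this representation, a simpler model corresponds to fewer rules and fewer literals per rule, which is the kind of simplicity the abstract says is traded off against accuracy.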

    GujiBERT and GujiGPT: Construction of Intelligent Information Processing Foundation Language Models for Ancient Texts

    In the context of the rapid development of large language models, we have meticulously trained and introduced the GujiBERT and GujiGPT language models, which are foundational models specifically designed for intelligent information processing of ancient texts. These models have been trained on an extensive dataset that encompasses both simplified and traditional Chinese characters, allowing them to effectively handle various natural language processing tasks related to ancient books, including but not limited to automatic sentence segmentation, punctuation, word segmentation, part-of-speech tagging, entity recognition, and automatic translation. Notably, these models have exhibited exceptional performance across a range of validation tasks using publicly available datasets. Our research findings highlight the efficacy of employing self-supervised methods to further train the models using classical text corpora, thus enhancing their capability to tackle downstream tasks. Moreover, it is worth emphasizing that the choice of font, the scale of the corpus, and the initial model selection all exert significant influence over the ultimate experimental outcomes. To cater to the diverse text processing preferences of researchers in digital humanities and linguistics, we have developed three distinct categories comprising a total of nine model variations. We believe that by sharing these foundational language models specialized in the domain of ancient texts, we can facilitate the intelligent processing and scholarly exploration of ancient literary works and, consequently, contribute to the global dissemination of China's rich and esteemed traditional culture in this new era. Comment: 22 pages, 0 figur

    Spatial and temporal regeneration patterns within gaps in the primary forests vs. secondary forests of Northeast China

    Forest gaps play an important role during forest succession in temperate forest ecosystems. However, the differences in spatial distribution and replacement patterns of woody plants (trees and shrubs) between primary and secondary forests remain unclear during the gap-filling processes, especially for temperate forests in Northeast China. We recorded 45,619 regenerated trees and shrubs in young gaps (<10 years), old gaps (10~20 years), and closed forest stands (i.e., filled gaps) in the primary broadleaved Korean pine (Pinus koraiensis Sieb. et Zucc.) forests vs. secondary forests (degraded from primary forests). The gap-filling processes along horizontal (Cartesian coordinate system) and vertical (lower layer: 0~5 m, medium layer: 5~10 m, and upper layer: >10 m) dimensions were quantified by shade tolerance groups of trees and shrubs. We found that gap age, competition between species, and pre-existing regeneration status resulted in different species replacement patterns within gaps in primary vs. secondary forests. Gap formation in both primary and secondary forests increased species richness, with 33, 38, 39, and 41 in the primary closed stands, primary forest gaps, secondary closed stands, and secondary forest gaps, respectively. However, only 35.9% of species in primary forest gaps and 34.1% in secondary forest gaps successfully reached the upper layer. Based on the importance values (IVs) of tree species across different canopy heights, light-demanding trees in the upper layer of the secondary forests were gradually replaced by intermediate and shade-tolerant trees. In the primary forests, Korean pine exhibited intermittent growth patterns at different canopy heights, while it had continuous regeneration along vertical height gradients in the secondary forests. The differences in Korean pine regeneration between the primary and secondary forests existed before gap formation and continued during the gap-filling processes. The interspecific competition among different tree species gradually decreased with increasing vertical height, and compared to the primary forests, the secondary forests showed an earlier occurrence of competition exclusion within gaps. Our findings revealed the species replacement patterns within gaps and provided a further understanding of the competition dynamics among tree species during the gap-filling processes.

    Reveal a hidden highly toxic substance in biochar to support its effective elimination strategy

    To develop optimized biochar with minimal contaminants, it is important to broaden the understanding of biochar. Here, we disclose, for the first time, a highly toxic substance (metal cyanide, MCN, such as KCN or NaCN) in biochar. The cyanide ion (CN−) content in biochar can be up to 85,870 mg/kg and is determined by the inherent metal content and type in the biomass, with K and Na increasing and Ca, Mg and Fe decreasing its formation. Density functional theory (DFT) analysis shows that unstable oxygen-containing alkali metal salts such as K2CO3 can induce an N rearrangement reaction to produce, for example, KOCN. The strong reducing character of the carbon matrix further converts KOCN to KCN, thus resulting in biochar with high risk. However, the stable Mg, Ca and Fe salts in biomass cannot induce an N rearrangement reaction due to their high binding energies. We therefore propose that high-valent metal chloride salts such as FeCl3 and MgCl2 could be used to inhibit the production of cyanide via metal interactive reaction. These findings open a new point of view on the potential risk of biochar and provide a mitigation solution for biochar's sustainable application.